Search CORE

708 research outputs found

Spectral classification of short numerical exon and intron sequences

Author: Benjamin YM Kwan
D Blackenberg
D Karolchik
Hon Keung Kwan
J Goecks
Jennifer YY Kwan
JYY Kwan
Publication venue: BioMed Central
Publication date: 01/01/2011

This research presents three new numerical representations for classifying short exon and intron sequences using the discrete Fourier transform period-3 value. Based on the human genome, results indicate that the Complex Twin-Pair representation is attractive compared with other numerical representations, and that the approach has potential applications in genome annotation and read mapping.
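As a rough illustration of the period-3 value mentioned in the abstract above, the sketch below evaluates the Fourier transform of a DNA sequence at frequency 1/3 and sums the power over four binary indicator sequences (one per base). This indicator encoding is a commonly used stand-in chosen here as an assumption; it is not the Complex Twin-Pair representation from the paper, whose mapping is not given in this listing. The function name `period3_power` is hypothetical.

```python
import cmath


def period3_power(seq):
    """Total spectral power at frequency 1/3 (period 3) for a DNA string.

    Assumption: each base gets a binary indicator sequence (1 where the
    base occurs, 0 elsewhere). When len(seq) is a multiple of 3 this is
    the DFT coefficient at index N/3; otherwise it is the transform
    evaluated directly at frequency 1/3.
    """
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # Sum the complex exponentials only at positions holding this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, symbol in enumerate(seq)
            if symbol == base
        )
        total += abs(coeff) ** 2
    return total


if __name__ == "__main__":
    # A strictly 3-periodic toy sequence scores high; a homopolymer scores ~0.
    print(period3_power("ATG" * 10))
    print(period3_power("A" * 9))
```

Coding regions tend to show a stronger period-3 component than non-coding regions because of codon structure, so a threshold on this value is one simple way such a classifier could be set up.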

Crossref

Springer - Publisher Connector

PubMed Central

Integrating diverse databases into an unified analysis framework: a Galaxy approach

Author: A. Nekrutenko
Blankenberg
Bock
D. Blankenberg
G. Von Kuster
Giardine
Hawkins
J. Taylor
Karolchik
Lyne
N. Coraor
Publication venue: Oxford University Press

Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources.

Crossref

PubMed Central

BigWig and BigBed: enabling browsing of large distributed datasets

Author: A. S. Hinrichs
A. S. Zweig
Alekseyenko
D. Karolchik
G. Barber
Guttman
Kent
Kent
Li
Rhead
W. J. Kent
Publication venue: Oxford University Press

Summary: BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols and Linux and UNIX operating systems files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets
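The idea that only the data needed for the current view is transferred can be sketched with a toy indexed file. The layout below (independently zlib-compressed blocks of 32-bit floats plus a flat list of `(start, end, byte_offset, byte_size)` entries) is an assumption made for illustration only; it is not the actual bigWig or bigBed binary format, and it scans a small in-memory index linearly rather than using an R tree. The names `build_file` and `query` are hypothetical.

```python
import io
import struct
import zlib


def build_file(values, block_len):
    """Pack per-position values into independently compressed blocks plus an index."""
    blob = io.BytesIO()
    index = []  # one (start, end, byte_offset, byte_size) tuple per block
    for start in range(0, len(values), block_len):
        chunk = values[start:start + block_len]
        data = zlib.compress(struct.pack(f"<{len(chunk)}f", *chunk))
        index.append((start, start + len(chunk), blob.tell(), len(data)))
        blob.write(data)
    return blob.getvalue(), index


def query(blob, index, q_start, q_end):
    """Return the values in [q_start, q_end) and the number of bytes read."""
    out, bytes_read = [], 0
    for start, end, offset, size in index:
        if end <= q_start or start >= q_end:
            continue  # block lies outside the requested view and is never read
        raw = zlib.decompress(blob[offset:offset + size])
        bytes_read += size
        block = struct.unpack(f"<{end - start}f", raw)
        out.extend(block[max(q_start, start) - start:min(q_end, end) - start])
    return out, bytes_read


if __name__ == "__main__":
    values = [float(i) for i in range(1000)]
    blob, index = build_file(values, block_len=100)
    view, n_bytes = query(blob, index, 250, 260)
    print(view, n_bytes, len(blob))
```

In a remote setting the slice `blob[offset:offset + size]` would correspond to a byte-range request against the server, which is why the cost scales with the size of the view rather than the size of the file.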

Crossref

PubMed Central

HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants

Author: Adzhubei
Andersen
Berger
Chen
Davydov
Durbin
Ernst
Han
Karolchik
L. D. Ward
Lander
M. Kellis
Matys
McCarthy
Ng
Nicolae
Pohlmann
Sherry
Touzet
Yue
Publication venue: Oxford University Press (OUP)
Publication date: 01/10/2011

The resolution of genome-wide association studies (GWAS) is limited by the linkage disequilibrium (LD) structure of the population being studied. Selecting the most likely causal variants within an LD block is relatively straightforward within coding sequence, but is more difficult when all variants are intergenic. Predicting functional non-coding sequence has been recently facilitated by the availability of conservation and epigenomic information. We present HaploReg, a tool for exploring annotations of the non-coding genome among the results of published GWAS or novel sets of variants. Using LD information from the 1000 Genomes Project, linked SNPs and small indels can be visualized along with their predicted chromatin state in nine cell types, conservation across mammals and their effect on regulatory motifs. Sets of SNPs, such as those resulting from GWAS, are analyzed for an enrichment of cell type-specific enhancers. HaploReg will be useful to researchers developing mechanistic hypotheses of the impact of non-coding variants on clinical phenotypes and normal variation. The HaploReg database is available at http://compbio.mit.edu/HaploReg.

Funding: National Institutes of Health (U.S.) (R01-HG004037); National Institutes of Health (U.S.) (RC1-HG005334); National Science Foundation (U.S.) (HG005334)

DSpace@MIT

Crossref

The UCSC Genome Browser database: update 2010

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
Austin
B. Giardine
B. J. Raney
B. Rhead
Berman
Blanchette
D. Haussler
D. Karolchik
F. Hsu
Feuk
G. P. Barber
H. Clawson
Hsu
Iafrate
J. Hillman-Jackson
Jain
K. E. Smith
K. Learned
K. R. Rosenbloom
Kaiser
Karolchik
Karolchik
Kent
L. R. Meyer
M. Diekhans
M. Pheasant
Nord
P. A. Fujita
Pettersen
R. A. Harte
R. M. Kuhn
Sherry
T. R. Dreszer
The ENCODE Project Consortium
The MGC Project Team
W. J. Kent
Publication venue: Oxford University Press
Publication date: 01/01/2010

The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools

CiteSeerX

Crossref

PubMed Central

University of Queensland eSpace

Dietary soy and meat proteins induce distinct physiological and gene expression changes in rats

Author: A Hagiwara
A Lass
A Shukla
A Subramanian
AM Uhe
B He
BB Albert
C Brandsch
C Chaveroux
CJ Andersen
CW Law
D Eberle
D El Khoury
D Gaidatzis
D Karolchik
D Merico
D Ord
D Tome
DW Gietzen
E Tornberg
G Fromentin
G Sarwar
H Sidransky
HX Yuan
J Cacho
J Schwarz
JI Lee
JL Jewell
JM Beasley
KK Carroll
L Abatangelo
L Noriega-Lopez
LE Matarese
M Friedman
M Shimobayashi
MD Robinson
MD Robinson
ME Ritchie
ML Orgeron
MS Kilberg
N Tachibana
PG Reeves
PM Leung
PM Pereira
R Bourgon
R Hoffenberg
R Hosomi
S Guo
S Madani
S Wen
SM Potter
SN Twigger
T Berry
TG Anthony
V Ranawana
W Gade
W Kang
Y Takahashi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This study reports on a comprehensive comparison of the effects of soy and meat proteins given at the recommended level on physiological markers of metabolic syndrome and the hepatic transcriptome. Male rats were fed semi-synthetic diets for 1 week that differed only in protein source, with casein serving as reference. Body weight gain and adipose tissue mass were significantly reduced by soy but not meat proteins. The insulin resistance index was improved by soy, and to a lesser extent by meat proteins. Liver triacylglycerol contents were reduced by both protein sources, which coincided with increased plasma triacylglycerol concentrations. Both soy and meat proteins changed plasma amino acid patterns. The expression of 1571 and 1369 genes was altered by soy and meat proteins, respectively. Functional classification revealed that lipid, energy and amino acid metabolic pathways, as well as insulin signaling pathways, were regulated differently by soy and meat proteins. Several transcriptional regulators, including NFE2L2, ATF4, Srebf1 and Rictor, were identified as potential key upstream regulators. These results suggest that soy and meat proteins induce distinct physiological and gene expression responses in rats and provide novel evidence and suggestions for the health effects of different protein sources in human diets.

Crossref

PubMed Central

Wageningen University & Research Publications

University of East Anglia digital repository

The UCSC Genome Browser Database: update 2009

Author: A. Pohl
A. S. Hinrichs
A. S. Zweig
B. Giardine
B. J. Raney
B. Rhead
Bellen
Blanchette
D. Haussler
D. Karolchik
F. Hsu
G. P. Barber
H. Clawson
Hinrichs
Hsu
Iafrate
K. E. Smith
K. R. Rosenbloom
Karolchik
Karolchik
Kent
L. Meyer
M. Diekhans
M. Pheasant
Mattes
Nord
P. Fujita
R. A. Harte
R. M. Kuhn
Sherry
T. Dreszer
T. Wang
The ENCODE Project Consortium
The MGC Project Team
W. J. Kent
Yang
Zhu
Publication venue: Oxford University Press

The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs

CiteSeerX

Crossref

PubMed Central

The UCSC Archaeal Genome Browser: 2012 update

Author: A. D. Holmes
A. M. Smith
Altschul
Besemer
Blanchette
Bland
Chan
D. Tran
Goecks
Hale
Hyatt
Kanehisa
Karolchik
Makarova
MARCK
P. P. Chan
Price
Randau
Schneider
Siepel
Siguier
T. M. Lowe
Tatusov
Tatusov
Publication venue: Oxford University Press
Publication date: 01/01/2012

The UCSC Archaeal Genome Browser (http://archaea.ucsc.edu) offers a graphical web-based resource for exploration and discovery within archaeal and other selected microbial genomes. By bringing together existing gene annotations, gene expression data, multiple-genome alignments, pre-computed sequence comparisons and other specialized analysis tracks, the genome browser is a powerful aggregator of varied genomic information. The genome browser environment maintains the current look-and-feel of the vertebrate UCSC Genome Browser, but also integrates archaeal and bacterial-specific tracks with a few graphic display enhancements. The browser currently contains 115 archaeal genomes, plus 31 genomes of viruses known to infect archaea. Some of the recently developed or enhanced tracks visualize data from published high-throughput RNA-sequencing studies, the NCBI Conserved Domain Database, sequences from pre-genome sequencing studies, predicted gene boundaries from three different protein gene prediction algorithms, tRNAscan-SE gene predictions with RNA secondary structures and CRISPR locus predictions. We have also developed a companion resource, the Archaeal COG Browser, to provide better search and display of arCOG gene function classifications, including their phylogenetic distribution among available archaeal genomes

CiteSeerX

Crossref

PubMed Central

Patterns of Evolution and Host Gene Mimicry in Influenza and Other RNA Viruses

Author: A Fadiel
Arnold J. Levine
Benjamin D. Greenbaum
BK Rima
CL Washenberger
CY Cheung
D Karolchik
D Karolchik
D Kobasa
DM Klinman
E Beutler
E Scarano
Edward C. Holmes
F De Amicis
F Takeshita
G Bernardi
G Bernardi
G Shaw
Gyan Bhanot
H Robins
IB Dawid
J Josse
J Sewatanon
K Jabbari
LA Shackelton
LR Cardon
MC Chan
MN Swartz
P Auewarakul
Q Yu
R Nussinov
R Rabadan
R Suspese
Raul Rabadan
RK Holmes
S Agrawal
S Hughes
S Karlin
S Karlin
T Sugiyama
T Vider-Shalit
V Hornung
W Salser
Y Wang
Publication venue: Public Library of Science
Publication date: 01/06/2008

It is well known that the dinucleotide CpG is under-represented in the genomic DNA of many vertebrates. This is commonly thought to be due to the methylation of cytosine residues in this dinucleotide and the corresponding high rate of deamination of 5-methylcytosine, which lowers the frequency of this dinucleotide in DNA. Surprisingly, many single-stranded RNA viruses that replicate in these vertebrate hosts also have a very low presence of CpG dinucleotides in their genomes. Viruses are obligate intracellular parasites and the evolution of a virus is inexorably linked to the nature and fate of its host. One therefore expects that virus and host genomes should have common features. In this work, we compare evolutionary patterns in the genomes of ssRNA viruses and their hosts. In particular, we have analyzed dinucleotide patterns and found that the same patterns are pervasively over- or under-represented in many RNA viruses and their hosts, suggesting that many RNA viruses evolve by mimicking some of the features of their host's genes (DNA) and likely also their corresponding mRNAs. When a virus crosses a species barrier into a different host, the pressure to replicate, survive and adapt leaves a footprint in dinucleotide frequencies. For instance, since human genes seem to be under higher pressure to eliminate CpG dinucleotide motifs than avian genes, this pressure might be reflected in the genomes of human viruses (DNA and RNA viruses) when compared to those of the same viruses replicating in avian hosts. To test this idea we have analyzed the evolution of the influenza virus since 1918. We find that the influenza A virus, which originated from an avian reservoir and has been replicating in humans over many generations, evolves in a direction strongly selected to reduce the frequency of CpG dinucleotides in its genome. Consistent with this observation, we find that the influenza B virus, which has spent much more time in the human population, has adapted to its human host and exhibits an extremely low CpG dinucleotide content. We believe that these observations directly show that the evolution of RNA viral genomes can be shaped by pressures observed in the host genome. As a possible explanation, we suggest that the strong selection pressures acting on these RNA viruses are most likely related to the innate immune response and to nucleotide motifs in the host DNA and RNAs.
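One common way to quantify the over- or under-representation of a dinucleotide such as CpG is the observed-to-expected ratio: the dinucleotide frequency divided by the product of its two single-base frequencies. The sketch below computes that ratio for one strand of a sequence. It is offered as a minimal illustration of the general idea, not as the statistic used by the authors, which this abstract does not specify; the function name `dinucleotide_odds_ratio` is hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq, dinuc="CG"):
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) on a single strand.

    Values well below 1 indicate under-representation of the dinucleotide;
    values above 1 indicate over-representation.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    # Count overlapping occurrences of the dinucleotide.
    pair_count = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    f_pair = pair_count / (n - 1)
    f_x = base_counts[dinuc[0]] / n
    f_y = base_counts[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        return 0.0  # dinucleotide cannot occur if either base is absent
    return f_pair / (f_x * f_y)


if __name__ == "__main__":
    print(dinucleotide_odds_ratio("ACGT" * 25))  # CpG-rich toy sequence
    print(dinucleotide_odds_ratio("CCCCGGGG"))   # C and G present, few CpG
```

Tracking this ratio across viral isolates ordered by sampling year is one simple way a decline in CpG content over time could be visualized.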

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

Author: A Auton
A Kong
A Navarro
A Necşulea
A Ratnakumar
A Siepel
Adam Siepel
AJ Jeffreys
AJ Webb
AP Boyle
BC Lamb
C Kosiol
CC Spencer
CF Mugal
D Karolchik
D Kostka
Dennis Kostka
E Mancera
G Marais
Graham Coop
J Berglund
J Harrow
J Romiguier
JA Capra
JM Chen
John A. Capra
JW IJdo
K Lindblad-Toh
K Pollard
Katherine S. Pollard
L Arbiza
L Duret
L Duret
LR Meyer
M Blanchette
M Hasegawa
Melissa J. Hubisz
MJ Hubisz
N Galtier
N Galtier
N Lartillot
P Flicek
P Stenson
RD George
S Glémin
S Katzman
S Katzman
S Myers
S Myers
SE Ptak
ST Sherry
T Nagylaki
TC Brown
TR Dreszer
W Winckler
Y Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al

arXiv.org e-Print Archive

Crossref

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

D-Scholarship@Pitt

FigShare